Reporting on Covid-19 in Arkansas: Data Journalism
Jour 5283, Fall 2020
Remote Delivery, Monday-Wednesday 9:40 a.m.-10:55 a.m.
Rob Wells, Ph.D.
rswells@uark.edu
@rwells1961
Agenda:
--Discuss syllabus
--Blackboard site
--Teams
--Arkansascovid.com
--Intro R and R Studio. Open program.
--Read Machlis, Ch. 2
--Advanced learners, see below
--Intellectual Property / Data Sharing Releases
Teams
For the first class, please download, install and activate Teams so we can do some exercises.
Teams is free through your university Office365 account. Download the Teams App through the Office365 suite https://its.uark.edu/communication-collaboration/office365/office365-desktop-apps.php
R and R Studio
Install R and R Studio.
Both are free, open-source software. They are small downloads and use little memory.
R runs on Windows, Mac and Linux, but this course is designed for the Mac version.
If you use Windows, there may be variations in the lessons and instructions. Please see me for questions.
Installing R is a two-step process:
1) Install R, the actual program
2) Install RStudio, a common interface
1) Download the most recent version of R for Mac:
https://mirrors.nics.utk.edu/cran/bin/macosx/R-4.0.2.pkg
--If you have a Windows computer, go to:
https://mirrors.nics.utk.edu/cran/bin/windows/base/R-4.0.2-win.exe
Accept all of the default settings for Mac.
2) Install RStudio, the interface we use to manage and create R code. Download the open source edition of RStudio Desktop and follow the prompts to install it.
https://rstudio.com/products/rstudio/download/#download
> [**Good instructions for installing R**](http://www.machlis.com/R4Journalists/download-r-and-rstudio.html){target="_blank"}
> [**Good overview of the program**](https://docs.google.com/presentation/d/1O0eFLypJLP-PAC63Ghq2QURAnhFo6Dxc7nGt4y_l90s/edit#slide=id.p){target="_blank"}
Intellectual Property / Data Sharing Releases
UofA Rules of Conduct
https://docs.google.com/document/d/1YkdkRIzIs1WQ3P9KIICvHcppfWvwTyo2bRhGwQPsgVE/edit
License Agreement
https://docs.google.com/document/d/1AahzxDOzTf9Z6PBjBvFBOjnn9_BiM4YldnXW-mHZr9s/edit
Teams Videos
Microsoft Teams allows us to easily share information through the class or in discrete groups.
Chat in Teams
https://www.microsoft.com/en-us/videoplayer/embed/RE4rLgJ?pid=ocpVideo5-innerdiv-oneplayer&postJsllMsg=true&maskLevel=20&market=en-us
Create a post
https://www.microsoft.com/en-us/videoplayer/embed/RE2BIrO?pid=ocpVideo0-innerdiv-oneplayer&postJsllMsg=true&maskLevel=20&market=en-us
How to tag a person in Teams
https://www.microsoft.com/en-us/videoplayer/embed/RWkJ9C?pid=ocpVideo0-innerdiv-oneplayer&postJsllMsg=true&maskLevel=20&market=en-us
Machlis, Sharon. Practical R for Mass Communications and Journalism. Chapman & Hall/CRC The R Series. 2018. ISBN 9781138726918 https://www.amazon.com/gp/search?keywords=9781138726918
Chapter 2: Get Started With R in a Few Easy Steps
Machlis, Ch. 1, Introduction; Ch. 3, See How Much You Can Do in a Few Lines of Code
Advanced Corner
--Import this data and summarize Washington County trends
## Wednesday, Aug. 26
Agenda
--Arkansascovid.com data
--Teams
--Installation issues on laptops?
--IRE Conference: IRE20 Conference will be Sept. 21-25
--R interface explained
--Exercises and tutorial
Basic R exercise
–Left click on the link, remove .txt extension, save as all files
https://github.com/profrobwells/CovidFall2020/blob/master/Exercises/Intro%20to%20R%208-20-2020.Rmd
Ch 1 & 2 of Machlis: Key Points
--Reproducible research
--Repetitive tasks in modern newsrooms: employment reports, crime stats, budgets
--Variables: an R object
--Assignment operator: <-
--R is case sensitive
--Vector: a vector can have only one type of data (all integers, all strings)
--Dataframe: like a spreadsheet
--Save files. Don’t save the workspace, because all of your variables will be stored and re-loaded the next time you launch RStudio. It’s too easy to forget about previously stored variables that can interfere with later work.
Software packages: tidyverse, rio, pacman
Data Types and R
Machlis: 2.4.2 Data types you’re likely to use often
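The key points above can be tried in the R console. This is a minimal sketch, not taken from Machlis; the object names (`county`, `new_cases`, `covid`) and the numbers are made up for illustration.

```r
# Assignment operator: store a value in an object (a variable).
county <- "Washington"
new_cases <- c(12L, 30L, 25L)   # a vector in which every element is an integer

# R is case sensitive: `County` is a different name from `county`.
exists("county")
exists("County")   # FALSE unless you created `County` earlier

# A vector holds one type only. Mixing numbers and text turns it all into text.
mixed <- c(1, "two", 3)
class(mixed)

# A data frame is like a spreadsheet: named columns, one row per record.
covid <- data.frame(
  day = c("Mon", "Tue", "Wed"),
  cases = new_cases
)
str(covid)
mean(covid$cases)
```

Run it line by line with CMD + Enter and watch the Environment pane fill up as each object is created.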
IRE Conference: IRE20 Conference will be Sept. 21-25
Fellowship: https://www.ire.org/events-and-training/conferences/2020-ire-conference/ire20-fellowships-scholarships
#### Reading Before Monday’s Class
Wong, Dona M. The Wall Street Journal Guide to Information Graphics. W. W. Norton & Company. 2013. ISBN 0393347281. https://www.amazon.com/Street-Journal-Guide-Information-Graphics/dp/0393347281
Ch. 1: The Basics
Arkansascovid.com
Agenda
Arkansascovid.com data
Building Your Own R Markdown Files
Tuesday Quiz on Basic R functions
Exercise
Arkansascovid.com exercise Lesson #1
Download this tutorial to work with Arkansascovid.com data
https://github.com/profrobwells/CovidFall2020/blob/master/Exercises/Arkansas%20Covid%20First%20Lesson%208-20-2020.Rmd
See Blackboard:
https://learn.uark.edu/webapps/assessment/take/launchAssessment.jsp?course_id=_276555_1&content_id=_8816418_1&mode=cpview
Machlis. Ch 4: Import Data into R
Cohen, "Numbers in the Newsroom," Common Mistakes.
Agenda
Discuss Quiz on Basic R functions
Arkansascovid.com exercise Lesson #1
Discuss Ch 4, Machlis: See Notes
Ch 3 & 4 of Machlis: Key Points
Ch 3 Exercises:
--The stock chart exercise used quantmod, a library for financial analysis
--Median income for a city
--Loading packages
Ch 4 Importing Data:
--How read.table() works for importing data
--Loading data
--Manipulating data: dplyr, stringr
--Data management: mutate, rename, bind_rows
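The import and data management steps listed above can be sketched in a few lines. This assumes the dplyr package is installed; the file contents, county figures and object names (`aug`, `sept`, `both`) are made up for illustration and are not from the Machlis chapter.

```r
library(dplyr)  # assumes dplyr (part of the tidyverse) is installed

# Write a tiny made-up CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("County,Cases", "Benton,40", "Washington,55"), path)

# read.table() must be told about the header row and the separator.
# read.csv() is the same function with those defaults already set.
aug <- read.table(path, header = TRUE, sep = ",", stringsAsFactors = FALSE)

# rename(): new_name = old_name
aug <- rename(aug, county = County, cases = Cases)

# mutate(): add a column
aug <- mutate(aug, month = "August")

# bind_rows(): stack a second data frame with the same columns underneath
sept <- data.frame(
  county = c("Benton", "Washington"),
  cases = c(62L, 71L),
  month = "September",
  stringsAsFactors = FALSE
)
both <- bind_rows(aug, sept)
both
```

With a real file, replace `path` with the location of your download, for example a CSV saved in your project folder.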
Machlis, Ch. 5: Basic Data Exploration
Beginner's guide to R:
https://www.computerworld.com/article/2497143/business-intelligence/business-intelligence-beginner-s-guide-to-r-introduction.html
Happy Labor Day! No Class
Machlis, Ch. 6: Beginning data visualization
Agenda
Describe Assignment #1
Continue with R tutorial
Exercise: Loading Data from U.S. Census & Student Loans
Assignment #1
Due Sept. 14: Managing Data / Static Graphic
Static Graphic - Managing Data in R.
Students will use R Studio to gather, analyze and visualize Arkansascovid data by demographic for Arkansas, then report and write a 600-word story.
Exercise
Downloading Data 3rd Lesson 3 8-21-2020.Rmd
Wong, Dona M. The Wall Street Journal Guide to Information Graphics. Ch. 2: Chart Smart
Charts_with_ggplot by Andrew Ba Tran
Agenda
Data Visualization
ggplot2 - charts and maps
Export Static chart
Discuss Ch 5, Machlis: See Notes
Students will use R Studio to gather, analyze and visualize Arkansascovid data by demographic for Arkansas, then report and write a 600-word story.
GGPLOT
A handy explanation of ggplot and its components
If you’re using ggplot: plus it!
For everything else: pipe it!
geom_point()
geom_bar()
geom_boxplot()
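Here is a minimal sketch of the "plus it" rule with the three geoms above. It assumes ggplot2 is installed; the data frame `cases` and its numbers are made up for illustration.

```r
library(ggplot2)  # assumes ggplot2 is installed

# Made-up numbers for illustration only.
cases <- data.frame(
  county = c("Benton", "Benton", "Pulaski", "Pulaski", "Washington", "Washington"),
  week = c(1, 2, 1, 2, 1, 2),
  new_cases = c(40, 62, 90, 85, 55, 71)
)

# ggplot layers are added with +, not with the pipe.
scatter <- ggplot(cases, aes(x = week, y = new_cases)) +
  geom_point()

# geom_bar() counts rows by default; stat = "identity" plots the y values as given.
bars <- ggplot(cases, aes(x = county, y = new_cases)) +
  geom_bar(stat = "identity")

# One box per county, summarizing that county's weekly values.
boxes <- ggplot(cases, aes(x = county, y = new_cases)) +
  geom_boxplot()

bars  # typing the object's name draws the chart
```

Swap `bars` for `scatter` or `boxes` on the last line to see the other two charts.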
Data Visualization Intro
Load tutorial: Basic Data Visualization 12-26-18.R
Wong, Dona M. The Wall Street Journal Guide to Information Graphics.
Ch. 3 & 4: Ready Reference and Tricky Situations
Basic Charts in R
https://www.youtube.com/watch?v=1EUJ0tsVsUA&t=12s
Agenda
Data Visualization
ggplot2 - charts and maps
> [**GGplot Video from Andrew Ba Tran**](https://www.youtube.com/watch?v=Sx7d7eGRSj0&t=9s){target="_blank"}
Exercise
Graphing in GGPlot
https://bit.ly/2Gqjfj4
--Multiple variables in a graph
--geom_line(), geom_point(), geom_bar()
--How to alter the colors in a chart
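The three ideas in this exercise can be sketched together as follows. This is not the class exercise itself; the data frame `trend`, the numbers and the color choices are assumptions made for illustration.

```r
library(ggplot2)  # assumes ggplot2 is installed

# Made-up weekly counts for two counties.
trend <- data.frame(
  week = rep(1:3, times = 2),
  county = rep(c("Benton", "Washington"), each = 3),
  new_cases = c(40, 62, 58, 55, 71, 66)
)

# Mapping color to county inside aes() draws one line per county.
chart <- ggplot(trend, aes(x = week, y = new_cases, color = county)) +
  geom_line() +
  geom_point() +
  # scale_color_manual() replaces the default colors with ones you choose.
  scale_color_manual(values = c(Benton = "steelblue", Washington = "firebrick")) +
  labs(title = "New cases by week (made-up data)", x = "Week", y = "New cases")

chart
```

For bar charts the equivalent aesthetic is `fill`, paired with `scale_fill_manual()`.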
Quiz - Math and R
TK TK TK Test Canvas: Excel Quiz
Wong, Dona M. The Wall Street Journal Guide to Information Graphics.
Ch. 5: Charting Your Course
Alberto Cairo, "The Functional Art," Principles of Data Visualization.
Agenda
Review Assignment #1
Assignment1_KEY_StaticGraphic_2_9.R
Dplyr bootcamp.
IRE Conference
**Dplyr Presentation**
Five basic verbs: filter(), select(), arrange(), mutate(), summarize(), plus group_by()
Pipes - a Much-Used Command to Link Filters, Functions
Pipe: %>% (shortcut: CMD + Shift + M)
Pipes are a way of chaining commands.
object %>% operation() -> result
Presentation from Bob Rudis on Writing Readable Code with Pipes, delivered at the rstudio::conf 2017.
https://www.rstudio.com/resources/videos/writing-readable-code-with-pipes/
Key Concepts - Moving Forward:
Dplyr: Filters, Grouping, Sorting, pipes %>%
Pipe shortcut = CMD + SHIFT + M
Basic data visualization
Tidyverse
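A short sketch of the five verbs plus group_by(), chained with pipes. It assumes dplyr is installed; the data frame `covid` and every number in it are made up for illustration.

```r
library(dplyr)  # assumes dplyr is installed

# Made-up data for illustration only.
covid <- data.frame(
  county = c("Benton", "Benton", "Pulaski", "Pulaski", "Washington", "Washington"),
  week = c(1, 2, 1, 2, 1, 2),
  new_cases = c(40, 62, 90, 85, 55, 71),
  tests = c(400, 500, 900, 800, 500, 600)
)

summary_table <- covid %>%
  filter(county != "Pulaski") %>%                    # keep only some rows
  select(county, week, new_cases, tests) %>%         # keep only some columns
  mutate(pct_positive = new_cases / tests * 100) %>% # add a calculated column
  group_by(county) %>%                               # one group per county
  summarize(
    total_cases = sum(new_cases),
    avg_pct_positive = mean(pct_positive)
  ) %>%
  arrange(desc(total_cases))                         # sort, largest first

summary_table
```

Read each `%>%` as "then": take the data, then filter it, then select columns, and so on.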
Exercise:
DPLYR BOOT CAMP 5th Lesson 8-21-2020.Rmd
Make a Dplyr Cheat Sheet, Hand in on Blackboard by 11:59 pm Monday.
You are in Dplyr bootcamp!
Machlis: Ch. 8 Analyze data by groups
Transforming and Analyzing Data dplyr.pdf, Andrew Ba Tran, Washington Post
Dplyr - Andrew Ba Tran - pipes-dplyr.pdf
Agenda
IRE Conference
Dplyr boot camp
There is nothing else. Focus!!!
DPLYR
DPLYR BOOT CAMP 5th Lesson 8-21-2020.Rmd
Notes: How Do I?
https://smach.github.io/R4JournalismBook/HowDoI.html
Machlis
Ch. 7 Two or more data sets
Ch. 13 Date calculations
Dealing-with-dates.pdf by Andrew Ba Tran
https://github.com/profrobwells/Data-Analysis-Class-Jour-405v-5003/blob/master/Readings/dealing-with-dates.pdf
Agenda
Lubridate
Review Ch. 13 Machlis
Review Tran and Lubridate
Exercises
–Using Lubridate
The exercise
Lubridate_Intro_Feb_20.R
https://bit.ly/2H07YpX
Lubridate vignette
browseVignettes("lubridate")
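Before opening the exercise, here is a small sketch of what lubridate does. It assumes the lubridate package is installed; the dates are sample values chosen for illustration.

```r
library(lubridate)  # assumes lubridate is installed

# Dates usually arrive as text. ymd() and mdy() parse them into real dates;
# the function name matches the order of year, month and day in the text.
raw_dates <- c("2020-08-24", "2020-09-07", "2020-11-26")
dates <- ymd(raw_dates)
due_date <- mdy("9/14/2020")

month(dates)                 # month number of each date
wday(dates, label = TRUE)    # day of the week
floor_date(dates, "month")   # round down to the first of the month
days_between <- as.numeric(due_date - dates[1])  # gap in days
days_between
```

Once a column holds real dates, you can group by month or week in dplyr and plot trends over time in ggplot.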
What we will produce
Machlis
Ch. 9 Graphing by Group
Ch. 11 Maps in R
Agenda
Lubridate
Exercises
--Key to Lubridate Questions in Exercise:
https://bit.ly/2BO92d1
Machlis Ch. 12 Putting it all Together: R on Election Day
Agenda
Mapping
Exercises
Mapping Exercise from Machlis book, Ch. 11
https://bit.ly/2VXCSU2
Agenda
Andrew Ba Tran - Week 4 Mapping
http://learn.r-journalism.com/en/mapping/
Data Cleaning
Disaggregating variables for summation
Agenda
Video of Machlis mapping
https://www.youtube.com/watch?v=HFJOV5XaU_U
See R script:
Maps in R March 24 2019
https://bit.ly/2FvZDrB
### Assignment 2. Graphic with Multiple Data Sources. Due Oct 12
Students will use R Studio to gather, analyze and visualize Arkansascovid data by demographic for Arkansas using Census data or school district data, and will produce publication-ready graphics from the data. Results will be posted on GitHub. 600-word story. Data dictionary required.
Details:
With the assigned dataset, XXXX.csv, students will produce the following tables:
--Most common words used in data text field
Based on this information, write a 600 word story, following AP style, that describes potential newsworthy trends.
By 11:59 p.m., you will submit the following on Blackboard:
XXX charts in .jpeg format: 1) Common words 2) Common hashtags 3) Date trend chart
One Google Doc with your findings, 600 words. Append at the end a brief data dictionary describing the Twitter data fields you used in the assignment.
An R script with the coding that shows how you loaded and cleaned the data and produced the charts
Notes
Answer Key discussion
top_n function makes life easy
aes - reorder in ggplot
–Visual Narrative Tricks by Alberto Cairo https://www.youtube.com/watch?v=TSGaueL4Ggk
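The two coding notes above, top_n and reorder, can be sketched like this. It is not the answer key; it assumes dplyr and ggplot2 are installed, and the data frame `counties` holds made-up numbers.

```r
library(dplyr)
library(ggplot2)

# Made-up numbers for illustration only.
counties <- data.frame(
  county = c("Benton", "Craighead", "Pulaski", "Sebastian", "Washington"),
  cases = c(62, 35, 90, 48, 71)
)

# top_n() keeps the rows with the largest values of `cases`.
top3 <- counties %>% top_n(3, cases)

# reorder() inside aes() sorts the bars by value instead of alphabetically.
ranked <- ggplot(top3, aes(x = reorder(county, cases), y = cases)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Cases (made-up data)")

ranked
```

In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which does the same job; top_n() still works.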
Agenda
Adapt Machlis - Maps in R Ch 11 exercise for Median Income in Arkansas
Maps in R March 24 2019 - Using the Census API
https://bit.ly/2FvZDrB
Building a Census tract map
We will build this
Image: Images/Income by Census Tract Washington Co.png
Exercises
Adapt Machlis - Maps in R Ch 11 exercise for Median Income in Arkansas
You will make this in class. Image: Images/ARmed_income3-22-19.png
Misc Items
Check what software packages are running: Global Environment
^ + shift + 8 = Zoom to Environment
Machlis Chs. 15 & 16 APIs - basics
https://medium.com/@LewisMenelaws/a-beginners-guide-to-web-apis-and-how-they-will-help-you-23923a0da450
Agenda
Sign up for a census key: https://api.census.gov/data/key_signup.html
What is an API?
A gentle introduction to APIs for data journalists:
Agenda
Machlis Chs. 17 & 18
Census Reporter to look up tables https://censusreporter.org/
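Once you have a table ID from Census Reporter and a Census key, an API request is just a web address with your question spelled out in it. The sketch below builds one such address. The helper `census_url()` is hypothetical and written for this note, the year is an assumption, and "YOUR_KEY_HERE" stands in for your own key.

```r
# Hypothetical helper: builds a request URL for the ACS 5-year API.
census_url <- function(variable, state_fips, year = 2018, key = "YOUR_KEY_HERE") {
  paste0(
    "https://api.census.gov/data/", year, "/acs/acs5",
    "?get=NAME,", variable,      # what to return
    "&for=county:*",             # every county...
    "&in=state:", state_fips,    # ...in this state
    "&key=", key
  )
}

# B19013_001E is median household income; 05 is the FIPS code for Arkansas.
url <- census_url("B19013_001E", "05")
url

# To fetch the data (needs internet, a real key and the jsonlite package):
# raw <- jsonlite::fromJSON(url)
# income <- as.data.frame(raw[-1, ], stringsAsFactors = FALSE)
# names(income) <- raw[1, ]
```

Pasting the printed address into a browser, with a real key, shows the same data the R code would download.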
Agenda
--Max Harlow Presentation on How to Use GitHub
https://docs.google.com/presentation/d/1MbltRcOerktc-E26HMDjYj0BO9CTubQWu1Z2bB9CpVY/edit#slide=id.g448ccc227721fe56_10
Read “Connecting the Dots” by Jacob Harris (2015) and discuss how people should or should not be represented through news visualizations.
Agenda
Resources:
This class is intended to teach you modern workflow techniques for coding. A centerpiece of that workflow is GitHub, a website with a system that allows you to collaborate with other programmers on coding projects. It manages versions of software code and is very popular with the tech elite.
Your GitHub account, which is public, represents an important professional image. Prospective employers and collaborators will look at your GitHub account.
--Create a GitHub account.
https://github.com/
--Simplified GitHub- GitHub Desktop
https://help.github.com/en/desktop
Exercises
See Basic GitHub 4-22-19.R https://bit.ly/2UAMGTd
Installing Git for a Mac - Andrew Ba Tran
--Follow this tutorial
https://guides.github.com/activities/hello-world/
Agenda
–Setting up an R Workflow http://learn.r-journalism.com/en/publishing/workflow/r-projects/
Resources on GitHub
–GitHub flow
https://guides.github.com/introduction/flow/
–GitHub Guides
https://guides.github.com/
–Another GitHub guide
https://andrewbtran.github.io/NICAR/2018/workflow/docs/03-integrating_github.html
Students will use R Studio to build interactive maps of Arkansascovid occupational data in Arkansas. Results will be posted on GitHub. Data dictionary required
Resources
Sentiment analysis:
Exercises
Joins in R:
https://bit.ly/2OFnGJ6
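As a warm-up for the joins exercise, here is a minimal sketch of two common joins. It assumes dplyr is installed; both data frames and all of their numbers are made up for illustration.

```r
library(dplyr)  # assumes dplyr is installed

# Two made-up tables that share a `county` column.
cases <- data.frame(
  county = c("Benton", "Pulaski", "Washington"),
  cases = c(62, 90, 71)
)
population <- data.frame(
  county = c("Benton", "Washington", "Sebastian"),
  population = c(280000, 240000, 128000)
)

# left_join(): keep every row of `cases`; unmatched counties get NA.
left <- left_join(cases, population, by = "county")

# inner_join(): keep only counties found in both tables.
inner <- inner_join(cases, population, by = "county") %>%
  mutate(per_100k = cases / population * 100000)

left
inner
```

Check the row counts after every join. An unexpected NA or a missing row usually means the names in the two tables don't match exactly.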
TAKE A DEEP BREATH, AMERICA
Agenda
Joins in R
Tidy text mining
https://www.tidytextmining.com/tidytext.html#
Bad data visualizations. Data Translation.
The Journalist as Programmer: A Case Study of The New York Times Interactive News Technology Department http://isoj.org/wp-content/uploads/2016/10/ISOJ_Journal_V2_N1_2012_Spring.pdf
What is code? http://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/
Agenda
Exercises
Julia Angwin, Terry Parris Jr., Surya Mattu. “Breaking the Black Box: What Facebook Knows About You,” ProPublica, 2016;
Nicholas Diakopoulos, “Algorithmic Accountability,” Digital Journalism, 2014.
Agenda
Julia Angwin article
Bigrams
Exercises
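A bigram is a pair of words that appear side by side. The sketch below counts them with tidytext; it assumes the dplyr and tidytext packages are installed, and the two sample sentences are made up rather than taken from the class Twitter data.

```r
library(dplyr)
library(tidytext)  # assumes tidytext is installed

# Two made-up sentences standing in for tweets.
tweets <- data.frame(
  id = 1:2,
  text = c("Wear a mask and wash your hands",
           "Please wear a mask in class"),
  stringsAsFactors = FALSE
)

bigrams <- tweets %>%
  # token = "ngrams", n = 2 splits each text into overlapping two-word pairs
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE)

bigrams
```

unnest_tokens() lowercases the text and strips punctuation by default, so "Wear a" and "wear a" are counted together.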
Agenda
Coulter Bigrams - Relationship
Image: Images/Coulter Bigram Graphic.jpeg
Exercise
Agenda
Amy Webb future of journalism trends
https://futuretodayinstitute.wetransfer.com/downloads/0e84e883e140bafe9a3436a6464032be20171003123607/ecda17
Google search tips
https://blog.expertisefinder.com/top-6-google-search-tips-for-journalists/
Artificial intelligence in the news
https://aiethicsinitiative.org/news/2019/3/12/artificial-intelligence-and-the-news-seven-ideas-receive-funding-to-ensure-ai-is-used-in-the-public-interest
Sharon Machlis's NICAR compilation site
http://www.machlis.com/nicar19.html
Agenda
Review quiz
Bigrams
Simple Web Scraping
GitHub
Coulter Bigrams - Score
Image: Images/Coulter bigram score.jpeg
Exercises
Bigrams:
http://bit.ly/bigramz
Web Scraping in R: Simple Web Scraper
http://bit.ly/scrapeme
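To show the idea behind a simple scraper without depending on a live website, this sketch parses a small HTML snippet written inline. It assumes the rvest package (version 1.0 or later) is installed; the page content is made up and this is not the class scraper.

```r
library(rvest)  # assumes rvest 1.0 or later is installed

# A made-up page. For a real site you would use read_html("https://...") instead.
page <- minimal_html('
  <h1>County update</h1>
  <table>
    <tr><th>County</th><th>Cases</th></tr>
    <tr><td>Benton</td><td>62</td></tr>
    <tr><td>Washington</td><td>71</td></tr>
  </table>
')

# Pick an element with a CSS selector, then pull out its text.
headline <- page %>% html_element("h1") %>% html_text2()

# html_table() turns an HTML table into a data frame.
counts <- page %>% html_element("table") %>% html_table()

headline
counts
```

Before scraping a real site, check its terms of use and whether it offers the data as a download or an API.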
Agenda
OFF - THANKSGIVING
Agenda
GitHub
### Assignment 4. Interactive Data Visualization. Due Nov 30.
Students will use R Studio to build interactive graphics / maps of Arkansascovid data by school district. 600-word story. Results will be posted on GitHub. Data dictionary required.
Important!
Course Evaluation
Please do me a favor and evaluate this course.
It's important to me and the department to get your thoughts
on what worked and what did not.
I was very happy with how things turned out this semester and
intend to offer this course again. If you think it is important,
then please take five minutes to fill out the survey.
https://courseval.uark.edu/
Agenda
R Markdown
http://bit.ly/2DBLSaX
Turn your R cheatsheet into a PDF
Turn your R cheatsheet into a web page on GitHub
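In RStudio the Knit button does this for you. The sketch below shows the same step in code, assuming the rmarkdown package and pandoc are available (both ship with RStudio). The file name `cheatsheet.Rmd` and its contents are made up for illustration.

```r
# Three backticks, built with strrep() so they can sit inside this example.
fence <- strrep("`", 3)

# A tiny R Markdown file: a YAML header, a heading and one code chunk.
rmd <- c(
  "---",
  "title: \"My R Cheat Sheet\"",
  "output: html_document",
  "---",
  "",
  "## Pipes",
  "",
  paste0(fence, "{r}"),
  "x <- c(1, 2, 3)",
  "mean(x)",
  fence
)

rmd_path <- file.path(tempdir(), "cheatsheet.Rmd")
writeLines(rmd, rmd_path)

# Render only if rmarkdown and pandoc are present on this computer.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
  out  # path to the finished web page
  # For a PDF, use output_format = "pdf_document" (needs a LaTeX install such as TinyTeX).
}
```

An HTML file placed in a GitHub repository can then be served as a web page through GitHub Pages.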
Agenda
Agenda
Congratulations!
We covered a lot this semester
Code in Fun
Questions on mapping exercises
Joins in R: https://bit.ly/2OFnGJ6
What is Sentiment Analysis
What is TidyText
Homework
Discuss Kavanaugh text mining story
Complete Coulter Tweet Analysis #2 exercises
–EXERCISES: Excel vs R
–Processing and counting hashtags
–Solving Problems in R: Bend Templates to Your Will
–Analyzing data –Finding Narratives
–#1: Solve the chronology problem with the chart
Image: Images/AOC Year-Months.png
–#2: Splitting Hashtags
Questions on Exercise
Splitting Hashtags 2-25-19.R
https://bit.ly/2BQIE2i
Resume AOC-Coulter Data Mining Exercise:
Coulter Tweet Analysis #2 Exercise
https://bit.ly/2TdY9Mv
Key to the exercise:
https://bit.ly/2F3brlh
Homework
Extracting Text Strings from data
https://bit.ly/2X5V9jL
What we will produce
Image: Images/Top Words in AOC Feed.png
Work through this exercise and bring questions to class:
Splitting Hashtags 2-25-19.R
https://bit.ly/2BQIE2i
--Graphing GGplot 12-28.R
Exercises from Machlis Ch. 9. Facets
Ch 3 Exercises:
The stock chart exercise used quantmod, a library for financial analysis.
dygraphs creates *interactive Web graphics* of data over time.
Reading
Resources:
Review Assignment #2
Coulter Tweet Analysis #2 Exercise
https://bit.ly/2TdY9Mv
Bots
Bot or Not: Difficulty determining a bot on Twitter
--An app that uses machine learning to guess if a Twitter account is a bot
https://www.r-bloggers.com/botrnot-an-r-app-to-detect-twitter-bots/
https://mikewk.shinyapps.io/botornot/
--Article about Botometer
https://www.vox.com/technology/2018/4/9/17214720/pew-study-bots-generate-two-thirds-of-twitter-links
--Stanford research paper on this topic
https://pdfs.semanticscholar.org/e219/6b47133c2191d380098744c13ba77133e625.pdf
–Read Kavanaugh text mining story: Text analysis of Brett Kavanaugh’s opinion.
http://www.storybench.org/bringing-textual-analysis-tools-to-judge-brett-kavanaughs-latest-opinion/
Samantha Sunne, “The Challenges and Possible Pitfalls of Data Journalism, and How You Can Avoid Them,” American Press Institute, 2016
–Create R Markdown document, export to PDF, HTML
StackOverflow
https://stackoverflow.com/questions/46691933/r-sort-by-year-then-month-in-ggplot2
Grammar of Graphics http://vita.had.co.nz/papers/layered-grammar.html
–Joining Dataframes in R
https://www.youtube.com/watch?v=gLg4D9bMIyc&t=13s
–Data Wrangling http://learn.r-journalism.com/en/wrangling/
http://learn.r-journalism.com/en/wrangling/dplyr/dplyr/
https://github.com/r-journalism/learn-chapter-3/blob/master/dplyr/pipes-dplyr.R
Notes
--The pie chart focuses the reader on large percentages, and encourages the reader to think of the total.
--The stacked bar plot provides the same information, but makes it easier to accurately determine at a glance how large each group is out of the whole.
--This bar chart splits the categories horizontally, and draws attention to how the family members are ordered. It encourages the reader to think about the distribution rather than disconnected categories, and gives a better sense of scale.
Reading Machlis Chs. 13 & 14.
Resources:
Seth C. Lewis, et al. “Big Data and Journalism: Epistemology, Expertise, Economics and Ethics,” Digital Journalism, 2015
–Review another R tutorial https://docs.google.com/presentation/d/1zICxR7qDM3RQ2Nxi5CqHlM3H8I7qoVkNtqcNcnbbDCw/edit#slide=id.p
RStudio Navigation Tricks You Might’ve Missed https://rviews.rstudio.com/2016/11/11/easy-tricks-you-mightve-missed/
How Do I? https://smach.github.io/R4JournalismBook/HowDoI.html
Functions https://smach.github.io/R4JournalismBook/functions.html
Packages https://smach.github.io/R4JournalismBook/packages.html
–Basic descriptive statistics
–Review ComputerWorld’s Beginner’s Guide To R
–Stack Overflow at stackoverflow.com
–String data manipulation https://dereksonderegger.github.io/570L/13-string-manipulation.html
–Follow StoryBench, Northeastern Univ. https://twitter.com/storybench
Resources
–Use R instead of Excel: Andrew Ba Tran
Excellent Tutorial Spelling out Excel and Comparable Commands in R
https://trendct.org/2015/06/12/r-for-beginners-how-to-transition-from-excel-to-r/
Basic data work- head to http://bit.ly/excel_and_r
–All Cheat Sheets https://www.rstudio.com/resources/cheatsheets/
Coulter Tweet Analysis #2 Exercise https://bit.ly/2TdY9Mv
More on this topic!
Comparing AOC and Coulter tweets
Twitter exploration exercise https://bit.ly/2Sqn1j1
Return to Twitter Engagement
http://bit.ly/2GParD5
Twitter historical API
https://developer.twitter.com/en/docs/tutorials/choosing-historical-api
–AOC Twitter feed
https://bit.ly/2Sqn1j1
–Discuss Twitter Metadata
–Work with sample Twitter data
https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object
–Study Twitter meta data
https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html
–Look at this example: Ocasio.csv (in data folder of course page)
–Show Collins results
–Twitter analysis of Trump Tweets http://varianceexplained.org/r/trump-tweets/
Homework Over Spring Break
R_Homework <- GiveMeABreak(homework = 0, c("enjoy yourself", "make good life choices", "call your parents"))